Abstract


We establish upper and lower bounds for the metric entropy and bracketing entropy of
the class of $d$-dimensional bounded monotonic functions under $L^p$ norms. It is interesting
to see that both the metric entropy and bracketing entropy have different behaviors for
$p < d/(d-1)$ and $p > d/(d-1)$.

1 Introduction


Shape-constrained functions appear very commonly in nonparametric estimation in statistics,
for example via renewal theory and mixtures of uniform distributions. A class of multivariate functions
of interest in applications is the class of “block-decreasing” densities; see, e.g., the papers of Polonik
and of Biau and Devroye. It consists of bounded densities on $[0,1]^d$ that are decreasing in
each variable. We denote by $\mathcal{F}_d$ the collection of non-negative functions on $[0,1]^d$ which are
bounded by 1 and monotonic in each variable, that is, monotonic along any line that is parallel
to an axis. As is well known, the rate of convergence of nonparametric estimators such as the
Maximum Likelihood Estimator (MLE) is determined by the metric entropy and bracketing
entropy bounds for an appropriate related class of functions; see the definitions below.

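In this paper, we provide upper and lower bounds for the entropy $\log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p)$ and the
bracketing entropy $\log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p)$, where $N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p)$ and $N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p)$ are defined
as follows:

$$N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) := \min\Big\{ m : \exists\, f_1, f_2, \dots, f_m \text{ s.t. } \mathcal{F}_d \subset \bigcup_{k=1}^{m} B_p(f_k, \varepsilon) \Big\},$$

where $B_p(f_k, \varepsilon) = \{ f \in \mathcal{F}_d : \|f - f_k\|_p < \varepsilon \}$, and

$$N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) := \min\Big\{ m : \exists\, \underline{f}_1, \overline{f}_1, \dots, \underline{f}_m, \overline{f}_m \text{ s.t. } \|\overline{f}_k - \underline{f}_k\|_p < \varepsilon,\ \mathcal{F}_d \subset \bigcup_{k=1}^{m} [\underline{f}_k, \overline{f}_k] \Big\},$$

where

$$[\underline{f}_k, \overline{f}_k] = \{ g \in \mathcal{F}_d : \underline{f}_k \le g \le \overline{f}_k \}.$$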
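To make the two definitions concrete, here is a minimal Python sketch. It assumes that functions are discretized as arrays of cell values on a uniform grid of $[0,1]^d$, so that $\|\cdot\|_p$ becomes an average over cells. The helper names `lp_dist`, `greedy_net_size` and `is_bracket_cover` are hypothetical. A greedy net only gives an upper bound on the covering number of a finite family, not the minimum in the definition of $N$.

```python
import itertools
import numpy as np


def lp_dist(f, g, p):
    """Discretized L^p distance: f, g hold cell values on a uniform grid of [0,1]^d."""
    return float(np.mean(np.abs(f - g) ** p) ** (1.0 / p))


def greedy_net_size(funcs, eps, p):
    """Size of a greedy eps-net of a finite family; an upper bound on its covering number.

    Each chosen centre f_k covers every member within L^p distance < eps,
    i.e. the open ball B_p(f_k, eps) used in the definition of N.
    """
    uncovered = list(range(len(funcs)))
    centres = 0
    while uncovered:
        c = uncovered[0]
        centres += 1
        uncovered = [j for j in uncovered if lp_dist(funcs[c], funcs[j], p) >= eps]
    return centres


def is_bracket_cover(funcs, brackets, eps, p):
    """Check both conditions in the definition of N_[]: width < eps and coverage."""
    narrow = all(lp_dist(hi, lo, p) < eps for lo, hi in brackets)
    covered = all(
        any(np.all(lo <= f) and np.all(f <= hi) for lo, hi in brackets) for f in funcs
    )
    return narrow and covered


# Example family: all nondecreasing step functions on 4 cells with values in {0, 1/2, 1}.
levels = [0.0, 0.5, 1.0]
family = [np.array(c) for c in itertools.combinations_with_replacement(levels, 4)]
```

With a tiny radius every member needs its own centre. The single bracket $[0, 1]$ covers the whole family but has width 1.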

The new bracketing entropy bounds have implications for the rate of convergence of the
Maximum Likelihood Estimator of a “block-decreasing” density, as will be shown in Section 5.

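Our main result is the following theorem.

Theorem 1.1. For $p \ge 1$, there exist constants $c_1$ and $C_2$ depending only on $p$ and $d$ such that, if
$(d-1)p \ne d$, then

$$c_1 \varepsilon^{-\alpha} \le \log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le \log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le C_2 \varepsilon^{-\alpha},$$

where $\alpha = \max\{d, (d-1)p\}$. If $(d-1)p = d$, then

$$c_1 \varepsilon^{-d} \le \log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le \log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le C_2 \varepsilon^{-d} (\log 1/\varepsilon)^{1+d/p}.$$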

Remark 1.2. We believe that in the critical case $(d-1)p = d$, the logarithmic factor in the
upper bound of Theorem 1.1 is not needed, and we prove in Theorem 4.1 that this is indeed so for the regular
entropy under the $L^p$ norm, provided $(d,p) \ne (2,2)$.
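The case distinction in Theorem 1.1 can be summarized by a small helper. This is a hypothetical sketch (the name `entropy_rate` is ours). It returns the exponent $\alpha$ and the power of $\log(1/\varepsilon)$ in the theorem's upper bound for the bracketing entropy. The power is nonzero only in the critical case $(d-1)p = d$.

```python
import math


def entropy_rate(d, p):
    """Return (alpha, log_power) such that Theorem 1.1 bounds the bracketing entropy by
    C * eps**(-alpha) * log(1/eps)**log_power; log_power is 0 unless (d - 1) p = d."""
    if d < 1 or p < 1:
        raise ValueError("Theorem 1.1 assumes d >= 1 and p >= 1")
    alpha = max(d, (d - 1) * p)
    if math.isclose((d - 1) * p, d):
        return alpha, 1 + d / p
    return alpha, 0.0


print(entropy_rate(2, 2))    # critical case
print(entropy_rate(3, 1))    # p < d/(d-1)
print(entropy_rate(3, 3))    # p > d/(d-1)
```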

It should be pointed out that when $d = 1$, $\mathcal{F}_d$ is essentially the class of probability distribution
functions, and the entropies are known to be of order $\varepsilon^{-1}$; see, e.g., van der Vaart and Wellner, Theorem 2.7.5,
page 159. So, in some sense, the results in this paper generalize the known results for $d = 1$.
It should also be noted that when $d > 1$, $\mathcal{F}_d$ is a much larger class than that of $d$-dimensional
probability distributions. Indeed, Blei, Gao and Li [7] recently proved that under the $L^2$ norm,
the metric entropy of the class $\mathcal{D}_d$ of $d$-dimensional probability distributions satisfies

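$$c_1 \varepsilon^{-1} [\log(1/\varepsilon)]^{d-1/2} (\log\log(1/\varepsilon))^{-1/2} \le \log N(\varepsilon, \mathcal{D}_d, \|\cdot\|_2) \le C_2 \varepsilon^{-1} [\log(1/\varepsilon)]^{d-1/2}$$

for $d > 2$, and

$$c_1 \varepsilon^{-1} [\log(1/\varepsilon)]^{3/2} \le \log N(\varepsilon, \mathcal{D}_d, \|\cdot\|_2) \le C_2 \varepsilon^{-1} [\log(1/\varepsilon)]^{3/2}$$

for $d = 2$.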

The paper is organized as follows. First, we prove the lower bound for the regular entropy by
constructing a well-separated set using a combinatorial argument. Next, we obtain the upper
bound for the bracketing entropy using a constructive proof, revealing the difference in entropy
growth between the cases $p < d/(d-1)$ and $p > d/(d-1)$. Then we turn to the critical
case $p = d/(d-1)$, and use the result for the case $p = 1$ and the metric entropy estimate of
convex hulls to remove the extra logarithmic factor in the upper bound for the regular entropy.
Finally, we apply the bracketing entropy estimate to establish a global rate of convergence of
the MLE of a $d$-dimensional “block-decreasing” density.

2 Lower bound 

In this section, we obtain the lower bound estimate, namely:

Proposition 2.1. For $p \ge 1$, there exists a constant $c_1 > 0$ such that

$$\log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \ge c_1 \varepsilon^{-\alpha},$$

where $\alpha = \max\{d, (d-1)p\}$.

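Proof. For convenience, we assume $\varepsilon = 2^{-n}$ for some positive integer $n$. We divide $[0,1]^d$ into
$\varepsilon^{-d}$ cubes of side-length $\varepsilon$. Define $g$ on $[0,1]^d$ such that on each open cube $\prod_{i=1}^{d} (k_i\varepsilon, k_i\varepsilon + \varepsilon)$,
$0 \le k_i < 2^n$, $1 \le i \le d$,

$$g(x) = \frac{(k_1 + k_2 + \cdots + k_d + 1/2)\,\varepsilon}{d} \pm \frac{\varepsilon}{4d}.$$

Moving to a neighboring cube along an axis raises $(k_1 + \cdots + k_d + 1/2)\varepsilon/d$ by $\varepsilon/d$, which exceeds the largest
possible change $\varepsilon/(2d)$ caused by the choice of signs, so every such $g$ is increasing along each axis; moreover,
$\varepsilon/(4d) \le g \le 1 - \varepsilon + 3\varepsilon/(4d) \le 1$. Clearly, there are $2^{\varepsilon^{-d}}$ different ways to define $g$, and each can be extended to a function in
$\mathcal{F}_d$. Let $\mathcal{G}_d$ be the collection of these extended functions.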
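The construction of $\mathcal{G}_d$ can be checked numerically. The sketch below assumes that on the cube indexed by $(k_1, \dots, k_d)$ the function equals $(k_1 + \cdots + k_d + 1/2)\varepsilon/d \pm \varepsilon/(4d)$. The helper names `make_g` and `is_monotone` are hypothetical. The code draws random signs and verifies monotonicity along every axis and the bound $0 \le g \le 1$.

```python
import numpy as np


def make_g(n, d, signs):
    """Values of one function g in G_d on the 2^(n d) open cubes (assumed formula).

    signs: array of shape (2^n,)*d with entries +1/-1, one choice per cube.
    """
    m = 2 ** n
    eps = 1.0 / m
    k_sum = np.indices((m,) * d).sum(axis=0)      # k_1 + ... + k_d for every cube
    return eps * (k_sum + 0.5) / d + signs * eps / (4 * d)


def is_monotone(a):
    """True if the array is nondecreasing along every axis."""
    return all(np.all(np.diff(a, axis=ax) >= 0) for ax in range(a.ndim))


rng = np.random.default_rng(0)
for d in (1, 2, 3):
    signs = rng.choice([-1, 1], size=(2 ** 3,) * d)
    g = make_g(3, d, signs)
    print(d, is_monotone(g), g.min() >= 0, g.max() <= 1)
```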

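For each $g \in \mathcal{G}_d$ define

$$B(g) = \{h \in \mathcal{G}_d : \text{there are at most } \varepsilon^{-d}/24 \text{ open cubes on which } g \ne h\}.$$

Since $\sum_{l \le L} \binom{m}{l} \le (me/L)^L$ for $1 \le L \le m$, and $(12e)^{1/12} < 2^{1/2}$, the number of functions in $\mathcal{G}_d$ that
differ from a fixed one on at most $\varepsilon^{-d}/12$ cubes is no more than $(12e)^{\varepsilon^{-d}/12} < 2^{\varepsilon^{-d}/2}$. Now take a maximal
family $g_1, g_2, \dots, g_N$ in $\mathcal{G}_d$ such that $B(g_i)$ and $B(g_j)$ are disjoint whenever $i \ne j$. By maximality, every
$g \in \mathcal{G}_d$ differs from some $g_i$ on at most $\varepsilon^{-d}/12$ cubes (otherwise $B(g)$ would be disjoint from all $B(g_i)$), so
$2^{\varepsilon^{-d}} \le N \cdot 2^{\varepsilon^{-d}/2}$, that is, $N \ge 2^{\varepsilon^{-d}/2}$. Moreover, if $i \ne j$, then $g_i$ and $g_j$ differ on more than
$\varepsilon^{-d}/12$ cubes, and $|g_i - g_j| = \varepsilon/(2d)$ on each of them. Hence

$$\|g_i - g_j\|_1 > \frac{\varepsilon^{-d}}{12} \cdot \varepsilon^{d} \cdot \frac{\varepsilon}{2d} = \frac{\varepsilon}{24d},$$

so $N((48d)^{-1}\varepsilon, \mathcal{F}_d, \|\cdot\|_1) \ge 2^{\varepsilon^{-d}/2}$, which implies

$$N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \ge N(\varepsilon, \mathcal{F}_d, \|\cdot\|_1) \ge \exp(c_1 \varepsilon^{-d})$$

for some constant $c_1 > 0$ and all $p \ge 1$.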
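The packing count can be sanity-checked with exact integers. This hypothetical sketch models the sets $B(g)$ as Hamming balls of radius $m/24$ on $m = \varepsilon^{-d}$ sign patterns. Disjointness then forces $g_i$ and $g_j$ to differ on more than $m/12$ cubes. A maximal family therefore has at least $2^m$ divided by the size of a Hamming ball of radius $m/12$ members. The function names are ours.

```python
import math


def hamming_ball_size(m, r):
    """Number of sign patterns on m cubes differing from a fixed one on at most r cubes."""
    return sum(math.comb(m, l) for l in range(r + 1))


def packing_lower_bound(m):
    """Lower bound on a maximal family with pairwise disjoint B(g_i) of radius m // 24:
    every pattern lies within Hamming distance 2 * (m // 24) <= m // 12 of some g_i,
    hence N >= 2^m / |Hamming ball of radius m // 12|."""
    return 2 ** m // hamming_ball_size(m, m // 12)


# The two numerical facts behind the estimate.
print((24 * math.e) ** (1 / 24) < 2 ** 0.5, (12 * math.e) ** (1 / 12) < 2 ** 0.5)
for m in (48, 96, 240):
    print(m, packing_lower_bound(m) >= 2 ** (m // 2))
```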

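When $p > d/(d-1)$, this lower bound is not sharp. In order to improve it, we
construct a different well-separated subset. We define $q(x)$ on $[0,1]^d$ as follows: on each open
cube that satisfies $k_1 + k_2 + \cdots + k_d = 2^n - 1$, $k_1, k_2, \dots, k_d \ge 0$, we define
$q(x) = 0$ or $q(x) = 1$. Clearly, $q$ can be extended to a function in $\mathcal{F}_d$ (for instance, by setting $q = 0$ on the cubes
with $k_1 + \cdots + k_d < 2^n - 1$ and $q = 1$ on the cubes with $k_1 + \cdots + k_d > 2^n - 1$). Now, because there are $c\varepsilon^{1-d}$
qualified cubes, where $c$ is a constant depending only on $d$, there are $2^{c\varepsilon^{1-d}}$ different functions
$q(x)$. The same combinatorial argument as the one given above shows that there are at least
$m = 2^{c\varepsilon^{1-d}/2}$ functions $q_1, q_2, \dots, q_m$ such that $|q_i - q_j| = 1$ on more than $c\varepsilon^{1-d}/12$ cubes whenever $i \ne j$.
Thus,

$$\|q_i - q_j\|_p \ge \left(\frac{c\varepsilon^{1-d}}{12} \cdot \varepsilon^{d}\right)^{1/p} = \left(\frac{c\varepsilon}{12}\right)^{1/p}.$$

This implies that

$$N\left(\tfrac{1}{2}(c\varepsilon/12)^{1/p}, \mathcal{F}_d, \|\cdot\|_p\right) \ge 2^{c\varepsilon^{1-d}/2},$$

which further implies

$$N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \ge \exp\left(c_1 \varepsilon^{-(d-1)p}\right)$$

for some constant $c_1 > 0$ when $p > d/(d-1)$. □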
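The count of “qualified” cubes is a stars-and-bars computation. The sketch below assumes the diagonal layer $k_1 + \cdots + k_d = 2^n - 1$. It compares the closed form with brute-force enumeration and shows the growth of order $\varepsilon^{1-d}$. The helper names are hypothetical.

```python
import itertools
import math


def layer_count(n, d):
    """Cubes (k_1, ..., k_d) with k_1 + ... + k_d = 2^n - 1 (assumed choice of layer).

    Any nonnegative solution has every k_i <= 2^n - 1, so by stars and bars the
    count is C(2^n - 1 + d - 1, d - 1), which grows like eps^(1-d) / (d-1)!.
    """
    return math.comb(2 ** n - 1 + d - 1, d - 1)


def layer_count_brute(n, d):
    m = 2 ** n
    return sum(1 for k in itertools.product(range(m), repeat=d) if sum(k) == m - 1)


for n, d in [(3, 2), (2, 3), (3, 3)]:
    print(n, d, layer_count(n, d), layer_count_brute(n, d))
```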

3 Upper bound 

In this section, we obtain an upper bound through a constructive proof. We will prove the following.

Proposition 3.1. For $p \ge 1$, $p \ne d/(d-1)$, there exists a constant $C_2 > 0$ such that

$$\log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le C_2 \varepsilon^{-\alpha},$$

where $\alpha = \max\{d, (d-1)p\}$. For $p = d/(d-1)$, there exists a constant $C_2 > 0$ such that

$$\log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le C_2 \varepsilon^{-d} (\log 1/\varepsilon)^{1+d/p}.$$

3.1 Construction 

For convenience, we introduce the notation

$$\omega(f, I) = \sup\{f(t) : t \in I\} - \inf\{f(t) : t \in I\},$$

where $I$ is any subset of $[0,1]^d$.

If $p = 1$, we choose $K = 2^d$; otherwise, we choose $K = 2^\beta$, where $\beta = \frac{1}{2}\left[d - 1 + \frac{1}{p-1}\right]$.
For any given $\varepsilon = 2^{-n}$, $n \in \mathbb{N}$, let $l$ be the integer satisfying $K^{-l} \le \varepsilon < K^{-l+1}$.
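The choice of $K$ and of the depth $l$ is easy to tabulate. The following hypothetical helpers (`choose_K`, `depth`) check numerically that $K^{p-1} < 2$, $= 2$ or $> 2$ according to whether $(d-1)p$ is below, equal to or above $d$. This trichotomy drives the case analysis in Section 3.2.

```python
import math


def choose_K(d, p):
    """K = 2^d if p == 1, otherwise K = 2^beta with beta = (d - 1 + 1/(p - 1)) / 2."""
    if p == 1:
        return 2.0 ** d
    beta = 0.5 * (d - 1 + 1.0 / (p - 1))
    return 2.0 ** beta


def depth(n, K):
    """The integer l with K^(-l) <= eps < K^(1-l) for eps = 2^(-n).

    The small tolerance guards against rounding when n / log2(K) is an integer.
    """
    return math.ceil(n / math.log2(K) - 1e-12)


for d, p in [(2, 1.5), (3, 1.5), (3, 3.0), (3, 1)]:
    K = choose_K(d, p)
    print(d, p, K, K ** (p - 1), depth(10, K))
```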

For each $f \in \mathcal{F}_d$, we construct $\underline{f}$ and $\overline{f}$ as follows. First, we partition $[0,1)^d$ into $\varepsilon^{-d}$ cubes
of side-length $\varepsilon$. (All the cubes are of the form $\prod_{i=1}^{d} [a_i, b_i)$.) A cube $I_0$ of side-length $\varepsilon$ is
selected if $\omega(f, I_0) \le K\varepsilon$. We partition each cube that is not selected into $2^d$ cubes of
equal size. In general, suppose we have a cube $I_i$ of side-length $2^{-i}\varepsilon$. If $\omega(f, I_i) \le K^{i+1}\varepsilon$, we
select the cube; otherwise we partition the cube into $2^d$ smaller cubes. This process continues
until $i = l$, at which level we always select the cube. Clearly, each point in $[0,1)^d$
belongs to exactly one of the selected cubes.
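The selection rule can be prototyped directly. The following Python sketch is a hypothetical illustration (the helper `select_cubes` and the test function `ramp` are ours). It assumes $f$ is continuous and nondecreasing in each variable, so that $\omega(f, I)$ on a cube equals the difference of $f$ at its two extreme corners. It also checks that the selected cubes tile $[0,1)^2$.

```python
import itertools


def select_cubes(f, d, eps, K, l):
    """Adaptive partition of [0,1)^d from Section 3.1 (a sketch).

    Assumption: f is continuous and nondecreasing in each variable, so its
    oscillation on the cube [a, a + h)^d is f(a + h) - f(a).
    Returns (i, corner) for every selected cube of side 2^-i * eps.
    """
    m = round(1 / eps)
    stack = [(0, tuple(k * eps for k in idx)) for idx in itertools.product(range(m), repeat=d)]
    selected = []
    while stack:
        i, a = stack.pop()
        h = eps / 2 ** i
        osc = f(tuple(x + h for x in a)) - f(a)
        if i == l or osc <= K ** (i + 1) * eps:
            selected.append((i, a))                     # keep this cube
        else:                                           # split into 2^d subcubes
            for offs in itertools.product((0, 1), repeat=d):
                stack.append((i + 1, tuple(x + o * h / 2 for x, o in zip(a, offs))))
    return selected


# A steep but continuous nondecreasing function on [0,1]^2 (hypothetical example).
def ramp(x):
    return min(1.0, max(0.0, (x[0] + x[1] - 1.0) / 0.05 + 0.5))


eps, K, l = 1 / 8, 2 ** 1.5, 2      # K for d = 2, p = 1.5; l from K^-l <= eps < K^(1-l)
cubes = select_cubes(ramp, 2, eps, K, l)
print(len(cubes), sum((eps / 2 ** i) ** 2 for i, _ in cubes))
```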

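On each selected cube $I$ of side-length $2^{-i}\varepsilon$, $0 \le i < l$, we define

$$\underline{f}(x) = K^{i+1}\varepsilon \left\lfloor \frac{\inf_{t \in I} f(t)}{K^{i+1}\varepsilon} \right\rfloor, \qquad \overline{f}(x) = K^{i+1}\varepsilon \left\lceil \frac{\sup_{t \in I} f(t)}{K^{i+1}\varepsilon} \right\rceil, \qquad x \in I.$$

On each selected cube of side-length $2^{-l}\varepsilon$ and on $[0,1]^d \setminus [0,1)^d$, we define $\overline{f} = 1$ and $\underline{f} = 0$.
Clearly, $\underline{f} \le f \le \overline{f}$.

Let $\underline{S} = \{\underline{f} : f \in \mathcal{F}_d\}$ and $\overline{S} = \{\overline{f} : f \in \mathcal{F}_d\}$. We will estimate $\|\overline{f} - \underline{f}\|_p$, and the
cardinalities $|\underline{S}|$ and $|\overline{S}|$ of $\underline{S}$ and $\overline{S}$, respectively.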
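The rounding in the definition of $\underline{f}$ and $\overline{f}$ has a simple consequence. If $\sup - \inf \le K^{i+1}\varepsilon$, then $\lceil a \rceil - \lfloor b \rfloor \le 2$ whenever $a - b \le 1$. Hence $\overline{f} - \underline{f} \le 2K^{i+1}\varepsilon$ on each selected cube of level $i < l$. A minimal randomized check follows. The function name `bracket_values` is hypothetical.

```python
import math
import random


def bracket_values(inf_f, sup_f, step):
    """Lower/upper bracket values on one selected cube, where step = K^(i+1) * eps."""
    lower = step * math.floor(inf_f / step)
    upper = step * math.ceil(sup_f / step)
    return lower, upper


random.seed(1)
worst = 0.0
for _ in range(10000):
    step = random.uniform(0.01, 0.5)
    lo_val = random.uniform(0.0, 1.0)
    hi_val = lo_val + random.uniform(0.0, step)      # oscillation at most step
    lower, upper = bracket_values(lo_val, hi_val, step)
    worst = max(worst, (upper - lower) / step)
print(worst)   # the argument above says this ratio is at most 2, up to floating-point rounding
```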
3.2 Bound for $\|\overline{f} - \underline{f}\|_p$

For each $i \in \mathbb{N} \cup \{0\}$, let $U_i$ be the union of the selected cubes of side-length $2^{-i}\varepsilon$. We first bound
the measure of $U_i$.

Let $s_i$ be the number of cubes of side-length $2^{-i}\varepsilon$ that have been selected, and $n_i$ be the
number of cubes of side-length $2^{-i}\varepsilon$ that have not been selected. Clearly, by the construction
of $\underline{f}$ and $\overline{f}$, we have $s_i + n_i = 2^d n_{i-1}$ for $i \ge 1$. In particular, $s_i \le 2^d n_{i-1}$.

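Now we estimate $n_{i-1}$ for $i \ge 1$. If a cube $I = \prod_{j=1}^{d} [a_j, b_j)$ of side-length $2^{-i+1}\varepsilon$ is
not selected, then $\omega(f, I) > K^i\varepsilon$. By the monotonicity of $f$ along each variable, there exists
$1 \le j \le d$ such that on the edge $A_{j-1}A_j$ we have $\omega(f, A_{j-1}A_j) > K^i\varepsilon/d$, where

$$A_j = (b_1, \dots, b_j, a_{j+1}, \dots, a_d), \qquad 0 \le j \le d.$$

(Indeed, if $f$ is nondecreasing in each variable, which can be arranged by reflecting coordinates, the change of $f$ between the
corners $A_0$ and $A_d$ dominates $\omega(f, I)$ and is the sum of its changes along the $d$ edges $A_{j-1}A_j$.)
Thus for the $n_{i-1}$ cubes of side-length $2^{-i+1}\varepsilon$ that were not selected, there are $n_{i-1}$ disjoint edges on which
$\omega(f, \cdot) > K^i\varepsilon/d$. Among these edges, at least $\lceil n_{i-1}/d \rceil$ are parallel. Furthermore,
among these parallel edges, there are at least $n_{i-1}/[d(2^{i-1}\varepsilon^{-1})^{d-1}]$ disjoint edges that lie on the
same line segment of length 1 that is parallel to one of the axes. Because $f$ is monotonic along this
line segment, and its total change is at most 1, we have

$$\frac{n_{i-1}}{d(2^{i-1}\varepsilon^{-1})^{d-1}} \cdot \frac{K^i\varepsilon}{d} \le 1.$$

Thus, $n_{i-1} \le d^2\, 2^{(i-1)(d-1)} K^{-i} \varepsilon^{-d}$.

Therefore, for $1 \le i \le l$, the measure of $U_i$ is bounded above by

$$s_i (2^{-i}\varepsilon)^d \le 2^d n_{i-1} (2^{-i}\varepsilon)^d \le 2d^2 (2K)^{-i}.$$

For $i = 0$, the measure of $U_0$ is trivially bounded by 1.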
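The exponent bookkeeping in the last display can be verified with exact rational arithmetic. In this hypothetical sketch, $K$ is taken rational so the identity $2^d \cdot d^2 2^{(i-1)(d-1)} K^{-i}\varepsilon^{-d} \cdot (2^{-i}\varepsilon)^d = 2d^2(2K)^{-i}$ can be checked exactly. The identity is algebraic in $K$, so the irrational choice $K = 2^\beta$ behaves the same.

```python
from fractions import Fraction


def u_measure_bound(i, d, K, eps):
    """s_i * (2^-i eps)^d, using s_i <= 2^d n_{i-1} and
    n_{i-1} <= d^2 2^{(i-1)(d-1)} K^{-i} eps^{-d}, in exact rational arithmetic."""
    n_prev = d ** 2 * Fraction(2) ** ((i - 1) * (d - 1)) / (K ** i * eps ** d)
    s_i = 2 ** d * n_prev
    return s_i * (eps / 2 ** i) ** d


eps = Fraction(1, 64)
for d in (1, 2, 3):
    for K in (Fraction(3), Fraction(4)):
        print(d, K, [u_measure_bound(i, d, K, eps) == 2 * d ** 2 * (2 * K) ** -i for i in range(1, 5)])
```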

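Recall that for $0 \le i < l$, $|\overline{f} - \underline{f}| \le 2K^{i+1}\varepsilon$ on $U_i$. Also, on $U_l$, we have $|\overline{f} - \underline{f}| \le 1$. Thus,

$$\begin{aligned}
\|\overline{f} - \underline{f}\|_p^p &= \int_{U_0} |\overline{f} - \underline{f}|^p + \sum_{i=1}^{l-1} \int_{U_i} |\overline{f} - \underline{f}|^p + \int_{U_l} |\overline{f} - \underline{f}|^p \\
&\le (2K\varepsilon)^p + \sum_{i=1}^{l-1} (2K^{i+1}\varepsilon)^p \cdot 2d^2 (2K)^{-i} + 2d^2 (2K)^{-l} \\
&= (2K\varepsilon)^p + 2^{p+1} K^p d^2 \sum_{i=1}^{l-1} \left(\frac{K^{p-1}}{2}\right)^{i} \varepsilon^p + 2d^2 (2K)^{-l}.
\end{aligned} \tag{2}$$

When $(d-1)p < d$ and $p = 1$, we have $K^{p-1} = 1$; when $(d-1)p < d$ and $p > 1$, we have $d - 1 < \beta < \frac{1}{p-1}$, so
$K = 2^\beta < 2^{1/(p-1)}$. In either case, $K^{p-1}/2 < 1$, and $\frac{1}{2K} < K^{-p}$. Therefore

$$\begin{aligned}
\|\overline{f} - \underline{f}\|_p^p &\le (2K\varepsilon)^p + 2^{p+1} K^p d^2 \cdot \frac{K^{p-1}}{2 - K^{p-1}}\, \varepsilon^p + 2d^2 K^{-lp} \\
&\le \left[(2K)^p + 2^{p+1} K^p d^2 \cdot \frac{K^{p-1}}{2 - K^{p-1}} + 2d^2\right] \varepsilon^p \\
&\le C\varepsilon^p
\end{aligned} \tag{3}$$

for some constant $C$ depending only on $p$ and $d$, where in the second inequality we used the
fact that $K^{-l} \le \varepsilon$.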
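As a numerical sanity check of (3), the sketch below evaluates the right-hand side of (2) term by term. It then compares the result with the constant in the square brackets of (3) times $\varepsilon^p$ in the case $(d-1)p < d$. The function names are hypothetical. $K$ and $l$ follow the choices of Section 3.1.

```python
import math


def rhs_of_2(d, p, n, K):
    """Right-hand side of (2) for eps = 2^-n, with l chosen so that K^-l <= eps < K^(1-l)."""
    eps = 2.0 ** -n
    l = math.ceil(n / math.log2(K) - 1e-12)
    total = (2 * K * eps) ** p
    total += sum((2 * K ** (i + 1) * eps) ** p * 2 * d ** 2 * (2 * K) ** -i for i in range(1, l))
    total += 2 * d ** 2 * (2 * K) ** -l
    return total


def constant_in_3(d, p, K):
    """The bracket in (3); finite only when K^(p-1) < 2, i.e. (d-1)p < d."""
    r = K ** (p - 1)
    return (2 * K) ** p + 2 ** (p + 1) * K ** p * d ** 2 * r / (2 - r) + 2 * d ** 2


d, p = 2, 1.5
K = 2 ** (0.5 * (d - 1 + 1 / (p - 1)))     # the choice of K from Section 3.1
for n in (5, 10, 20):
    print(n, rhs_of_2(d, p, n, K) / (2.0 ** -n) ** p, constant_in_3(d, p, K))
```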

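When $(d-1)p > d$, we have $d - 1 > \beta > \frac{1}{p-1}$. So, $K = 2^\beta > 2^{1/(p-1)}$, that is, $K^{p-1}/2 > 1$.
Hence,

$$\begin{aligned}
\|\overline{f} - \underline{f}\|_p^p &\le (2K\varepsilon)^p + 2^{p+1} K^p d^2 \cdot \frac{(K^{p-1}/2)^{l}}{K^{p-1}/2 - 1}\, \varepsilon^p + 2d^2 (2K)^{-l} \\
&= (2K\varepsilon)^p + \frac{2^{p+1} K^p d^2}{K^{p-1}/2 - 1} (K^l\varepsilon)^p (2K)^{-l} + 2d^2 (2K)^{-l} \\
&\le (2K\varepsilon)^p + c(2K)^{-l} \\
&\le (2K)^p \varepsilon^p + c\,\varepsilon^{1 + 1/\beta} \\
&\le c'\varepsilon^{1 + 1/\beta},
\end{aligned}$$

for some constants $c, c' > 0$ depending only on $p$ and $d$. Here we used the fact that $1 \le K^l\varepsilon < K$ (so that
$(K^l\varepsilon)^p < K^p$, and $(2K)^{-l} = (K^{-l})^{1 + 1/\beta} \le \varepsilon^{1 + 1/\beta}$ because $2 = K^{1/\beta}$), and in the last inequality we used the
fact that $p > 1 + 1/\beta$.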

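When $(d-1)p = d$, we have $K^{p-1} = 2$. So, we obtain from (2) that

$$\begin{aligned}
\|\overline{f} - \underline{f}\|_p^p &\le (2K\varepsilon)^p + 2^{p+1} K^p d^2 (l-1) \varepsilon^p + 2d^2 (K^p)^{-l} \\
&\le c\,\varepsilon^p \log(1/\varepsilon),
\end{aligned}$$

for some constant $c > 0$ depending only on $p$ and $d$, where in the last inequality we used the fact
that $1 \le K^l\varepsilon < K$ (so that $l \le 1 + \log_K(1/\varepsilon)$ and $(K^p)^{-l} \le \varepsilon^p$).

Summarizing, we obtain that

$$\|\overline{f} - \underline{f}\|_p \le \begin{cases} c\,\varepsilon, & (d-1)p < d, \\ c\,\varepsilon (\log 1/\varepsilon)^{1/p}, & (d-1)p = d, \\ c\,\varepsilon^{\frac{\beta+1}{\beta p}}, & (d-1)p > d. \end{cases} \tag{4}$$

3.3 Bounds for $|\underline{S}|$ and $|\overline{S}|$

We derive the upper bound for $|\overline{S}|$. The argument for bounding $|\underline{S}|$ is almost identical.

Because all the selected cubes of side-length $\varepsilon$ are chosen from the $\varepsilon^{-d}$ cubes of the initial partition, there are
no more than $2^{\varepsilon^{-d}}$ different ways of selecting cubes of side-length $\varepsilon$. For $1 \le i < l$, the selected
cubes of side-length $2^{-i}\varepsilon$ are chosen from the $2^d n_{i-1}$ subcubes of the $n_{i-1}$ cubes of side-length $2^{-i+1}\varepsilon$ that were not
selected in the previous step, so there are no more than $2^{2^d n_{i-1}}$ different ways to select the cubes
of side-length $2^{-i}\varepsilon$ (at level $i = l$, all remaining cubes are selected). Once the cubes are selected, for each $0 \le i < l$, the $s_i$ selected cubes of
side-length $2^{-i}\varepsilon$ can be grouped into no more than $(2^i\varepsilon^{-1})^{d-1}$ rows. Suppose row $j$ contains
$r_j$ selected cubes. Because the values of $\overline{f}$ on these $r_j$ cubes are in monotonic order, and are
all chosen from $0, K^{i+1}\varepsilon, 2K^{i+1}\varepsilon, \dots, mK^{i+1}\varepsilon$, where $m = \lceil K^{-i-1}\varepsilon^{-1} \rceil$, the number of different ways of
assigning values of $\overline{f}$ on these $r_j$ cubes is bounded by

$$\binom{r_j + m}{r_j} \le \max\{\exp(c r_j), \exp(cK^{-i}\varepsilon^{-1})\} \le \exp(c r_j) \cdot \exp(cK^{-i}\varepsilon^{-1}).$$

Thus, the number of different ways to assign the values of $\overline{f}$ on the $s_i$ selected cubes of side-length $2^{-i}\varepsilon$ is bounded by

$$\prod_{j=1}^{(2^i\varepsilon^{-1})^{d-1}} \left(\exp(c r_j) \cdot \exp(cK^{-i}\varepsilon^{-1})\right) \le \exp(c s_i) \cdot \exp\left(c (2^i\varepsilon^{-1})^{d-1} K^{-i}\varepsilon^{-1}\right) \le \exp\left(c' (2^{d-1}K^{-1})^i \varepsilon^{-d}\right),$$

where in the last inequality we used $s_i \le 2^d n_{i-1}$ (and $s_0 \le \varepsilon^{-d}$), and the estimate $n_{i-1} \le d^2\, 2^{(i-1)(d-1)} K^{-i} \varepsilon^{-d}$
obtained in §3.2.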
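The count of monotone value assignments in one row is a standard multiset count. A nondecreasing sequence of length $r$ with values in $\{0, 1, \dots, m\}$ can be chosen in $\binom{r+m}{r}$ ways. Combined with $\binom{r+m}{r} \le 2^{r+m} \le \max\{4^r, 4^m\}$, this gives the displayed bound with $c = \log 4$. The helper names in this sketch are hypothetical.

```python
import itertools
import math


def count_monotone(r, m):
    """Nondecreasing sequences of length r with values in {0, 1, ..., m}
    (the grid values j * K^(i+1) * eps, j = 0..m, in one row of selected cubes)."""
    return math.comb(r + m, r)


def count_monotone_brute(r, m):
    return sum(1 for _ in itertools.combinations_with_replacement(range(m + 1), r))


for r, m in [(3, 4), (5, 2), (0, 7)]:
    print(r, m, count_monotone(r, m), count_monotone_brute(r, m), 2 ** (r + m))
```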

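Hence, the total number of realizations of $\overline{f}$ is bounded by

$$2^{\varepsilon^{-d}} \prod_{i=1}^{l-1} 2^{2^d n_{i-1}} \cdot \exp\left(c' \sum_{i=0}^{l-1} (2^{d-1}K^{-1})^i \varepsilon^{-d}\right) \le \exp\left(c'' \sum_{i=0}^{l-1} (2^{d-1}K^{-1})^i \varepsilon^{-d}\right), \tag{5}$$

where in the last inequality we again used the estimate $n_{i-1} \le d^2\, 2^{(i-1)(d-1)} K^{-i} \varepsilon^{-d}$.

When $(d-1)p > d$, we have $2^{d-1} > 2^\beta = K$, and, using $K^l < K\varepsilon^{-1}$ and $2 = K^{1/\beta}$, we can bound the right-hand side of (5) by

$$\exp\left(c'''[2^{d-1}K^{-1}]^{l} \varepsilon^{-d}\right) \le \exp\left(c'''' \varepsilon^{-(\beta+1)(d-1)/\beta}\right).$$

When $(d-1)p = d$, we have $2^{d-1} = K$, and the right-hand side of (5) can be bounded by
$\exp(c'''\varepsilon^{-d}\log 1/\varepsilon)$.

When $(d-1)p < d$, we have $2^{d-1}/K < 1$, and the right-hand side of (5) is bounded
by $\exp(c'''\varepsilon^{-d})$.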

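Summarizing, we obtain

$$\log|\overline{S}| \le \begin{cases} c'''\varepsilon^{-d}, & (d-1)p < d, \\ c'''\varepsilon^{-d}\log 1/\varepsilon, & (d-1)p = d, \\ c''''\varepsilon^{-(\beta+1)(d-1)/\beta}, & (d-1)p > d. \end{cases} \tag{6}$$

3.4 Proof of Proposition 3.1

Every $f \in \mathcal{F}_d$ lies in the bracket $[\underline{f}, \overline{f}]$, so the number of brackets needed is at most $|\underline{S}| \cdot |\overline{S}|$, and
$|\underline{S}|$ satisfies the same bound as $|\overline{S}|$. Combining (4) and (6), and expressing the bounds in terms of the bracket width, we have

$$\log N_{[\,]}(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le \begin{cases} C\varepsilon^{-d}, & (d-1)p < d, \\ C\varepsilon^{-d}(\log 1/\varepsilon)^{1+d/p}, & (d-1)p = d, \\ C\varepsilon^{-(d-1)p}, & (d-1)p > d, \end{cases}$$

for all $\varepsilon = 2^{-n}$, $n \in \mathbb{N}$. The monotonicity of bracketing numbers in $\varepsilon$ implies that Proposition 3.1
holds for all $\varepsilon < 1$.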
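The change of variables behind “expressing the bounds in terms of the bracket width” can be written out. The following LaTeX sketch tracks no constants. It writes $\delta$ for the bracket width in (4) and uses the entropy bound (6).

```latex
% Critical case (d-1)p = d
\delta \asymp \varepsilon(\log 1/\varepsilon)^{1/p}
\;\Longrightarrow\;
\varepsilon \asymp \delta(\log 1/\delta)^{-1/p},
\qquad
\log N_{[\,]}(\delta) \le \log|\underline{S}| + \log|\overline{S}|
\lesssim \varepsilon^{-d}\log(1/\varepsilon)
\asymp \delta^{-d}(\log 1/\delta)^{d/p}\,\log(1/\delta)
= \delta^{-d}(\log 1/\delta)^{1+d/p}.

% Supercritical case (d-1)p > d
\delta \asymp \varepsilon^{(\beta+1)/(\beta p)}
\;\Longrightarrow\;
\log N_{[\,]}(\delta) \lesssim \varepsilon^{-(\beta+1)(d-1)/\beta}
= \bigl(\varepsilon^{(\beta+1)/(\beta p)}\bigr)^{-(d-1)p}
\asymp \delta^{-(d-1)p}.
```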


4 Critical Case

We believe that the logarithmic factor in Theorem 1.1 is not needed. In this section, we prove
that if we only consider the regular entropy, then when $(d,p) \ne (2,2)$, the logarithmic factor
can indeed be removed.
Theorem 4.1. For $(d,p) \ne (2,2)$, there exist constants $c_1, C_2$ depending only on $p$ and $d$ such
that

$$c_1 \varepsilon^{-\alpha} \le \log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) \le C_2 \varepsilon^{-\alpha},$$

where $\alpha = \max\{d, (d-1)p\}$.

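Proof. In view of Theorem 1.1, it remains to show the upper bound for the case $(d-1)p = d$,
$d > 2$. Let

$$T = \left\{ \mathbf{1}_A : A = \{(x_1, x_2, \dots, x_d) : f(x_1, x_2, \dots, x_d) < \lambda\},\ 0 < \lambda < 1,\ f \in \mathcal{F}_d \right\}.$$

Then clearly $\mathcal{F}_d$ is the closed convex hull of $T$, that is, $\mathcal{F}_d = \overline{\operatorname{conv}}(T)$.

For any $\mathbf{1}_A \in T$, there exist $f \in \mathcal{F}_d$ and $0 < \lambda < 1$ such that

$$A = \{(x_1, \dots, x_d) : f(x_1, \dots, x_d) < \lambda\}.$$

By changing variables $t_i = 1 - x_i$ if necessary, we can assume that $f$ is non-decreasing with respect
to every variable $x_i$, $1 \le i \le d$. Define $f_A$ on $[0,1]^{d-1}$ as follows:

$$f_A(x_1, \dots, x_{d-1}) = \sup\{t \in [0,1] : f(x_1, \dots, x_{d-1}, t) < \lambda\} \qquad (\sup\emptyset := 0),$$

so that, up to a null set, $A = \{x : x_d < f_A(x_1, \dots, x_{d-1})\}$. It is easy to check that $f_A \in \mathcal{F}_{d-1}$. Furthermore, for all
$\mathbf{1}_A, \mathbf{1}_B \in T$, $\|\mathbf{1}_A - \mathbf{1}_B\|_p = \|f_A - f_B\|_1^{1/p}$. Thus,

$$N_{[\,]}(\varepsilon, T, \|\cdot\|_p) \le N_{[\,]}(\varepsilon^p, \mathcal{F}_{d-1}, \|\cdot\|_1).$$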
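For $d = 2$, the identity $\|\mathbf{1}_A - \mathbf{1}_B\|_1 = \|f_A - f_B\|_1$ (with $\|\mathbf{1}_A - \mathbf{1}_B\|_p^p = \|\mathbf{1}_A - \mathbf{1}_B\|_1$) can be checked on a grid. In each column, both sublevel sets are initial segments, so their symmetric difference has length $|f_A - f_B|$. The discretization at grid midpoints and the names `sublevel_profile`, `f`, `g` are assumptions for illustration only.

```python
import numpy as np


def sublevel_profile(f_grid, lam):
    """f_grid[i, j] ~ f(x_1, x_2) on an m x m grid, nondecreasing in both indices.

    A = {f < lam} meets each vertical line {x_1 fixed} in an initial segment, and
    f_A(x_1) is the length of that segment (a discrete version of the definition of f_A).
    """
    m = f_grid.shape[1]
    return (f_grid < lam).sum(axis=1) / m


m = 64
x = (np.arange(m) + 0.5) / m
f = (x[:, None] + x[None, :] ** 2) / 2          # nondecreasing in both variables
g = np.sqrt(x[:, None] * x[None, :])            # another element of F_2
A, B = f < 0.3, g < 0.6
lhs = np.mean(np.abs(A.astype(float) - B.astype(float)))                 # ||1_A - 1_B||_1
rhs = np.mean(np.abs(sublevel_profile(f, 0.3) - sublevel_profile(g, 0.6)))  # ||f_A - f_B||_1
print(lhs, rhs)
```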


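Therefore, by applying Proposition 3.1 to $\mathcal{F}_{d-1}$ with $p = 1$, we have

$$\log N(\varepsilon, T, \|\cdot\|_p) \le \log N_{[\,]}(\varepsilon, T, \|\cdot\|_p) \le C(\varepsilon^p)^{-(d-1)} = C\varepsilon^{-(d-1)p} = C\varepsilon^{-d}.$$

Recall a general theorem of [?] (see also [?]) that

$$\log N(\varepsilon, \operatorname{conv}(S)) = O(\varepsilon^{-\alpha})$$

whenever $\log N(\varepsilon, S) = O(\varepsilon^{-\alpha})$ for $\alpha > 2$. Applying these results, we obtain

$$\log N(\varepsilon, \mathcal{F}_d, \|\cdot\|_p) = \log N(\varepsilon, \overline{\operatorname{conv}}(T), \|\cdot\|_p) \le c\varepsilon^{-d}$$

for $(d-1)p = d > 2$.

When $(p, d) = (2, 2)$, we have $(d-1)p = 2$. It was proved in [?] that

$$\log N(\varepsilon, \operatorname{conv}(S)) = O(\varepsilon^{-2}(\log 1/\varepsilon)^2)$$

whenever $\log N(\varepsilon, S) = O(\varepsilon^{-2})$, and, in general, this cannot be improved. Note that this bound
is exactly the bound we obtained earlier using a direct construction. Thus, in this case, using
convex hulls does not improve the estimate. □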


5 Rates of convergence for the Maximum Likelihood Estimator of a block decreasing density

Biau and Devroye showed that the minimax rate of convergence for estimating a bounded
block decreasing density with $L_1$ risk is of order $n^{-1/(d+2)}$, and constructed histogram estimators that
attain this rate. Here is a more precise description of their result. Let $\mathcal{F}_B$ denote the class
of all block decreasing densities on the unit cube $[0,1]^d$ bounded by $B$. Define the risk of the
estimator $f_n$ when the true density is $f \in \mathcal{F}_B$ by

$$R(f_n, f) = E_f \int_{\mathbb{R}^d} |f_n(x) - f(x)|\, dx,$$

and the maximum (or “worst case”) risk by

$$R_n(\mathcal{F}_B) = \sup_{f \in \mathcal{F}_B} R(f_n, f).$$

The minimax risk is $\mathcal{R}_n(\mathcal{F}_B) = \inf_{f_n} R_n(\mathcal{F}_B)$. Biau and Devroye showed that, for some constant $C_2 > 0$,

$$\mathcal{R}_n(\mathcal{F}_B) \ge C_2 \left(\frac{S^d}{n}\right)^{1/(d+2)},$$

where $S = \log(1 + B)$. The resulting minimax lower bound rate of convergence is $r_n^{-1} = (S^d/n)^{1/(2+d)}$,
and Biau and Devroye constructed generalizations of the histogram estimators of Birgé which achieve
this rate of convergence.

The MLE of a decreasing density on $[0, M]$ is well known to converge at the rate $n^{-1/3}$ with respect to the Hellinger
and $L_1$ metrics; see Birgé. Although the study of the MLE of a block decreasing density was
initiated by Polonik, the rate of convergence of the MLE in this setting with respect to the
Hellinger or $L_1$ metrics is apparently unknown for $d \ge 2$. It is known from Birgé and Massart
(see also van der Vaart and Wellner, pages 326-327, together with Theorem 3.4.1, page 322) that maximum likelihood
estimators have a rate of convergence of at least $n^{\gamma/2}$ when the bracketing entropy with
respect to the Hellinger metric $h$ of the class of densities $\mathcal{P}$ satisfies

$$\log N_{[\,]}(\varepsilon, \mathcal{P}, h) \le \frac{K}{\varepsilon^{1/\gamma}}, \qquad \varepsilon > 0, \tag{7}$$

with $\gamma < 1/2$; here the Hellinger distance $h(P, Q)$ is given by $h^2(P, Q) = \int [\sqrt{p} - \sqrt{q}]^2\, d\mu$, where
$\mu$ is any measure dominating both $P$ and $Q$, and $p, q$ are the densities of $P, Q$ with respect to
$\mu$. From the results of Biau and Devroye, it might be guessed that (7) holds for $\mathcal{P} = \mathcal{P}_B$ with $1/\gamma = d$, and
this would lead to the rate of convergence $n^{1/(2d)}$ for the MLE when $d > 2$. Our Theorem
1.1 suggests that the rate of convergence of the MLE (with respect to the Hellinger distance)
is still slower than this for $d > 2$, as is shown in the following proposition. We suppose that
$X_1, \dots, X_n$ are i.i.d. with density $f \in \mathcal{P}_B$.

Proposition 5.1. Suppose that $\hat f_n$ is the MLE of a block decreasing density $f$ on $[0,1]^d$. Then,
if $d \ge 3$,

$$n^{1/(4(d-1))}\, h(\hat f_n, f) = O_p(1). \tag{8}$$

If $d = 2$, then

$$\frac{n^{1/4}}{\log n}\, h(\hat f_n, f) = O_p(1). \tag{9}$$
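Proof. We use the results of Birgé and Massart as presented in Section 3.4 of van der Vaart and Wellner. From
Theorem 3.4.1, page 322, with the set of candidate densities taken to be

$$\mathcal{P} = \{p : p \text{ a block-decreasing density on } [0,1]^d \text{ bounded by } B\},$$

it follows that we need to establish the inequalities of the first display of page 323. These
follow from Theorem 3.4.4, page 327, for the Hellinger distance $h$ by choosing $p_n = p_0$ and
taking $\mathcal{P}_n = \mathcal{P}$: the resulting bound for $\|\mathbb{G}_n\|_{\mathcal{M}_\delta}$ with

$$\mathcal{M}_\delta = \left\{ m_p = \log\frac{p + p_0}{2p_0} : p \in \mathcal{P} \right\}$$

is of the form

$$E^*\|\mathbb{G}_n\|_{\mathcal{M}_\delta} \lesssim J_{[\,]}(\delta, \mathcal{P}, h)\left(1 + \frac{J_{[\,]}(\delta, \mathcal{P}, h)}{\delta^2\sqrt{n}}\right) =: \phi_n(\delta), \tag{10}$$

where

$$J_{[\,]}(\delta, \mathcal{P}, h) = \int_{c\delta^2}^{\delta} \sqrt{1 + \log N_{[\,]}(\varepsilon, \mathcal{P}, h)}\, d\varepsilon$$

for a suitable constant $c > 0$, in view of the discussion on page 326 and [?], Theorem 1, page 118. Since $\sqrt{p}$ is block-decreasing
with bound $\sqrt{B}$ if $p$ is block-decreasing with bound $B$, it follows that

$$\log N_{[\,]}(\varepsilon, \mathcal{P}, h) = \log N_{[\,]}(\varepsilon, \mathcal{P}^{1/2}, \|\cdot\|_2) = \log N_{[\,]}(\varepsilon/\sqrt{B}, \mathcal{P}^{1/2}/\sqrt{B}, \|\cdot\|_2) \le \log N_{[\,]}(\varepsilon/\sqrt{B}, \mathcal{F}_d, \|\cdot\|_2),$$

where $\|\cdot\|_2$ is the $L_2$ norm (with respect to Lebesgue measure $\lambda$), $\mathcal{P}^{1/2} = \{\sqrt{p} : p \in \mathcal{P}\}$ is contained in the class
of block-decreasing functions with bound $\sqrt{B}$, and hence $\mathcal{P}^{1/2}/\sqrt{B}$ is contained in the class of block-decreasing
functions with bound 1. Thus, for $d \ge 2$, we calculate, using Theorem 1.1 with $p = 2$,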


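$$\begin{aligned}
J_{[\,]}(\delta, \mathcal{P}, h) &\le \int_{c\delta^2}^{\delta} \sqrt{1 + \log N_{[\,]}(\varepsilon/\sqrt{B}, \mathcal{F}_d, \|\cdot\|_2)}\, d\varepsilon \\
&\le \begin{cases} \int_{c\delta^2}^{\delta} \sqrt{1 + C_2 B^{d-1}\varepsilon^{-2(d-1)}}\, d\varepsilon, & d > 2, \\ \int_{c\delta^2}^{\delta} \sqrt{1 + C_2 B\varepsilon^{-2}(\log 1/\varepsilon)^2}\, d\varepsilon, & d = 2, \end{cases} \\
&\lesssim \begin{cases} \delta^{-2(d-2)}, & d > 2, \\ (\log 1/\delta)^2, & d = 2, \end{cases}
\end{aligned}$$

where $f(x) \lesssim g(x)$ means $f(x) \le Kg(x)$ for some constant $K$. Plugging this into (10) yields

$$\phi_n(\delta) \lesssim \delta^{-2(d-2)}\left(1 + \frac{\delta^{-2(d-2)}}{\delta^2\sqrt{n}}\right) \quad \text{for } d > 2,$$

$$\phi_n(\delta) \lesssim (\log(1/\delta))^2\left(1 + \frac{(\log(1/\delta))^2}{\delta^2\sqrt{n}}\right) \quad \text{for } d = 2.$$

It is easily verified that when $d > 2$, $r_n^2\phi_n(1/r_n) \lesssim \sqrt{n}$ if $r_n = n^{1/(4(d-1))}$. When $d = 2$,
$r_n^2\phi_n(1/r_n) \lesssim \sqrt{n}$ if $r_n = n^{1/4}/\log n$. Thus the rate of convergence of the MLE is at least
$n^{1/(4(d-1))}$ for $d > 2$, and $n^{1/4}/\log n$ for $d = 2$. □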
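The “easily verified” step can be checked numerically. The sketch below assumes all constants hidden in $\lesssim$ equal 1 and uses the entropy-integral bounds from the proof. It then evaluates $r_n^2\phi_n(1/r_n)/\sqrt{n}$. For $d > 2$ the ratio is exactly 2. For $d = 2$ it stays below 1 and tends to $1/16 + 1/256$. The names `phi_n` and `rate` are hypothetical.

```python
import math


def phi_n(delta, n, d):
    """phi_n(delta) from (10), with the entropy-integral bounds of the proof and all implied constants set to 1."""
    J = delta ** (-2 * (d - 2)) if d > 2 else math.log(1 / delta) ** 2
    return J * (1 + J / (delta ** 2 * math.sqrt(n)))


def rate(n, d):
    """Candidate r_n from the proof of Proposition 5.1."""
    return n ** (1 / (4 * (d - 1))) if d > 2 else n ** 0.25 / math.log(n)


for d in (2, 3, 5):
    for n in (1e4, 1e8, 1e12):
        r = rate(n, d)
        print(d, int(n), r ** 2 * phi_n(1 / r, n, d) / math.sqrt(n))
```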